Translingual Information Retrieval: Learning from Bilingual Corpora (AI Journal Special Issue: Best of IJCAI-97)
Abstract
Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR approaches establishing translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms dictionary-based methods. Translingual versions of the Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) also perform well, as do translingual pseudo relevance feedback (PRF) and Example-Based Term-in-context Translation (EBT). All showed relatively small performance loss between monolingual and translingual versions, ranging from 87% to 101% of monolingual IR performance. Query translation based on a general machine-readable bilingual dictionary, heretofore the most popular method, did not match the performance of other, more sophisticated methods. Also, the previous very high LSI results in the literature, based on "mate-finding", were superseded by more realistic relevance-based evaluations; LSI performance proved comparable to that of other statistical corpus-based methods.
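To make the idea of translingual pseudo relevance feedback concrete, the following is a minimal sketch, not the paper's implementation. It assumes a sentence-aligned bilingual corpus, plain term-frequency cosine similarity, and a hypothetical helper name `translingual_prf_query`; the toy English-Spanish pairs are invented for illustration. The source-language query retrieves the best-matching source-side texts, and the terms of their aligned target-side counterparts form the target-language query.

```python
from collections import Counter
import math


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translingual_prf_query(query, parallel_corpus, k=2, n_terms=5):
    """Hypothetical helper: build a target-language query from a source query.

    parallel_corpus is a list of (source_text, target_text) aligned pairs.
    The k source texts most similar to the query are treated as relevant,
    and the most frequent terms of their target-side counterparts are returned.
    """
    q = tf_vector(query)
    ranked = sorted(
        parallel_corpus,
        key=lambda pair: cosine(q, tf_vector(pair[0])),
        reverse=True,
    )
    target_terms = Counter()
    for _, target_text in ranked[:k]:
        target_terms.update(tf_vector(target_text))
    return [term for term, _ in target_terms.most_common(n_terms)]


# Toy aligned corpus (invented for illustration only).
corpus = [
    ("the bank raised interest rates", "el banco subió las tasas de interés"),
    ("interest rates fell at the central bank",
     "las tasas de interés bajaron en el banco central"),
    ("the river bank flooded", "la orilla del río se inundó"),
]

terms = translingual_prf_query("bank interest rates", corpus)
print(terms)
```

A realistic system would add stop-word removal and term weighting such as TF-IDF; both are left out here to keep the sketch short.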
Similar resources
Translingual Information Retrieval: Learning from Bilingual Corpora
Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR appr...
Translingual Information Retrieval: A Comparative Evaluation
Translingual information retrieval (TIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TIR methods and reports on comparative TIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation-based and statistical-IR appr...
Learning bilingual translations from comparable corpora to cross-language information retrieval: hybrid statistics-based and linguistics-based approach
Recent years saw an increased interest in the use and the construction of large corpora. With this increased interest and awareness has come an expansion in the application to knowledge acquisition and bilingual terminology extraction. The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, combination to linguistics-based pruning a...
Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval
The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, phrasal translation as well as evaluations on Cross-Language Information Retrieval. A two-stage translation model is proposed for the acquisition of bilingual terminology from comparable corpora, disambiguation and selection of best translation alternatives according to their...
Automatic extraction of bilingual word pairs using inductive chain learning in various languages
In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora with various languages is important. However, previous works based on statistical methods are insufficien...
Publication date: 1997